2012 US Presidential Election Campaign Contributions – Part 2: by Andrew Lavers

Load the data from campaign_5.csv and look at basic stats

IMPORTANT: This is a 5% sample of the complete data set. The sample should be sufficient to represent the distributions in the full data set, but totals computed from it are only about one twentieth of the true totals. Any totals reported here have NOT been adjusted for sampling.

The data to be loaded is in file campaign_5.csv. This is a munged data set based on the presidential campaign ALL states data set. The munging is documented in the separate AndrewLaversCampainMunge.html produced from AndrewLaversCampainMunge.Rmd

## [1] "Analyzing  259869 rows from /Users/alavers/Documents/Udacity/Data Analysis with R/P3/campaign_5.csv"

Univariate Plots Section

Contribution Amounts

Investigate contribution amounts and establish a category.

At first thought, contribution amount is a continuous variable, but in reality there are very distinct buckets that show in the chart as spikes at 25, 50, 100, 250, 500, 1000, 1500, 2000, and 2500.

Clearly there are many small contributions, which is confirmed by the following basic statistics of the contribution amount.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.01   25.00   50.00  180.10  100.00 2500.00

Based on the above chart and some experimentation, I encoded the distinct buckets as a new categorical variable named contb_receipt_amt_category, with the following levels.

## [1] "(0,25]"      "(25,50]"     "(50,100]"    "(100,250]"   "(250,1000]" 
## [6] "(1000,2500]"

With the above buckets, the number of contributions falls steadily as the contribution amount rises. This is as expected and probably reflects the overall wealth demographics of the country.
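The bucketing can be sketched with base R's cut(). This is a minimal illustration, not the code used in the munge document: the amounts below are made-up values, and the breaks are read off the level names shown above. Note that dig.lab = 4 is needed for the labels to print as "(250,1000]" rather than in scientific notation.

```r
# Hypothetical contribution amounts (illustrative only, not taken from campaign_5.csv)
amounts <- c(0.01, 25, 30, 50, 100, 250, 500, 1000, 2500)

# Breaks inferred from the level names in the report.
# cut() builds right-closed intervals by default, so exactly 25 falls in "(0,25]".
breaks <- c(0, 25, 50, 100, 250, 1000, 2500)
contb_receipt_amt_category <- cut(amounts, breaks = breaks, dig.lab = 4)

# Inspect the levels and the number of contributions per bucket
levels(contb_receipt_amt_category)
table(contb_receipt_amt_category)
```

Because the intervals are closed on the right, the common round amounts ($25, $50, $100, ...) land at the top of their bucket rather than at the bottom of the next one.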

Contributions by Candidate

In the above chart, the two final presidential candidates for the general election are orders of magnitude greater than the primary candidates, so further analysis of the primary candidates is not likely to be that interesting.

Contributions over time

Contributions accelerate approaching the early November election date, as can be seen in the above chart. It will be interesting to see the different pace of Republican versus Democratic contributions.

Contributions by State

The above counts by state show a few large states as frequent contributors.

Contributions by Party

The counts by party above are substantially different. We should later investigate the relative size and number of contributions by party.

Contributions by Swing State

While the above chart shows that there are substantial differences in contributions when comparing Swing States to Non-swing states, the overall state populations are most likely masking the effect. Very populous states such as NY, CA and TX are not swing states.

Univariate Analysis

What is the structure of your dataset?

The data set to analyze has individual contributions to 2012 presidential candidates. Each contribution has:

  • Contributor - Name and address

  • Contributor - Occupation (Not useful because it isn’t normalized: many different values are equivalent. There may be a few interesting high-frequency items like LAWYER and PHYSICIAN.)

  • Contribution - Date, Year month, amount

  • Committee and Candidate - Committee ID, Candidate ID and Candidate Name

  • Election type - P2012 for primaries and G2012 for general election

  • Form type and transaction id - Not used in this analysis

  • Party Affiliation - Republican or Democratic

What is/are the main feature(s) of interest in your dataset?

  • Individual contribution - Each individual contribution is represented as a row so the contribution counts

  • Candidate (cand_nm) - Obviously the presidential election is about the person that will be president

  • State (contbr_st) - In presidential elections, states vote with Electoral College ballots, so votes within a state matter. See for example http://en.Wikipedia.org/wiki/Electoral_College_%28United_States%29. The state identifiers in this data set include identifiers for non-voting territorial possessions (e.g., Guam, US Virgin Islands).

  • Contribution Amount (contb_receipt_amt) - The dollar amount of the contribution is the most interesting item to analyze. This part of the data set was limited to contributions at or under the 2012 contribution limit of $2500. There are varying reports of whether contributions under $200 must be reported. About half the contributions in this data set are $50 and under. We can analyze the differences and totals of contributions to reach broad conclusions, but these will not represent the full population of contributions.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

  • Date (contb_receipt_dt, contb_receipt_ym) - Looking at contributions over time may prove interesting. The Republicans had a large primary field while the Democrats presented a single candidate, the incumbent president. Time may show us when the primary candidates dropped out.

  • Occupation (contbr_occupation) - While this may be interesting, the values are not normalized and cannot be effectively compared across all occupations. There are a few discrete occupations that may be interesting, such as LAWYER, PHYSICIAN, and TEACHER.

  • Employer (contbr_employer) - It would be very interesting to find employers with many contributors. However, this may be better left for another type of analysis, since visualization with plots may not effectively tease out isolated hot spots. Some values that appear here, such as “RETIRED”, “HOMEMAKER”, “UNEMPLOYED”, and “NOT EMPLOYED”, may be useful as broad categories, but there are many “INFORMATION REQUESTED” entries, which indicate missing data.

Did you create any new variables from existing variables in the dataset?

  • Party (party) - Party pools the contributions, including the early ones that went to multiple Republican primary candidates, into two main buckets: Republican vs. Democratic. The party value was determined from Wikipedia articles and merged onto the main data set.

  • Swing State (swing_st) - In USA presidential elections, the president is actually elected by an Electoral College of state representatives who cast a pre-allocated number of electoral votes on behalf of their state. In many states, the full block of electoral votes must go to the winning candidate in that state. If such a state has almost equal Democratic and Republican support, it can be the one state that “swings” the election. In 2012 there were 9 swing states as tracked by the New York Times.

  • Categorized Contribution Amount (contb_receipt_amt_category) - The charts show distinct levels of contribution at these breaks: $25, $50, $100, $250, $1000, and $2500. A categorized variable was added to facilitate analysis.

  • Receipt year month (contb_receipt_ym) - The Year month of the receipt date to simplify trend plotting.
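Two of the derived variables above can be sketched in base R. This is only an illustration under stated assumptions: the rows are made up, the "Last, First" name format and the ISO date format are assumptions, and party_lookup is a hypothetical stand-in for the table compiled from Wikipedia.

```r
# Hypothetical rows; column names follow the data set, values are illustrative
contribs <- data.frame(
  cand_nm = c("Obama, Barack", "Romney, Mitt", "Obama, Barack"),
  contb_receipt_dt = as.Date(c("2012-09-05", "2012-10-15", "2011-12-31")),
  stringsAsFactors = FALSE
)

# Receipt year-month (e.g. "2012-09") to simplify trend plotting
contribs$contb_receipt_ym <- format(contribs$contb_receipt_dt, "%Y-%m")

# Hypothetical lookup table standing in for the party values taken from Wikipedia
party_lookup <- data.frame(
  cand_nm = c("Obama, Barack", "Romney, Mitt"),
  party = c("Democratic", "Republican"),
  stringsAsFactors = FALSE
)

# Left join: keep every contribution, attach the candidate's party
contribs <- merge(contribs, party_lookup, by = "cand_nm", all.x = TRUE)
```

Using all.x = TRUE keeps contributions for any candidate missing from the lookup table (with party set to NA), so gaps in the lookup are visible instead of silently dropping rows.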

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

For more detail, see the data munging document, AndrewLaversCampaignMunge.html.

The main operations performed were:

  • Remove negative values that represent returns. Strictly, we should also find the original contribution and remove that as well, but finding the original is made more difficult by sampling. Since returns make up only about 3% of the data set, this was left for a future exercise. Negative return values distort the means and medians.

  • Limit to contributions of $2500 or less. This eliminated a few very large party committee transfers and many smaller corporate contributions, and leaves a more consistent data set of individual contributions only.

  • Categorized the contribution amount into buckets

  • Eliminate “states” that are territorial possessions etc., that do not form the Electoral College.

  • Limit to dates after 1/1/2011

  • Eliminate the Green Party because there are very few contributions
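The filtering steps listed above might look roughly like the following in base R. This is a sketch on made-up rows, not the actual munge code: the territory list is deliberately incomplete, the party label "Green" is an assumed encoding, and treating 1/1/2011 itself as included is an assumption.

```r
# Hypothetical raw rows, one for each kind of record the filters should remove
raw <- data.frame(
  contb_receipt_amt = c(50, -25, 5000, 100, 250, 75),
  contbr_st = c("NY", "CA", "TX", "GU", "OH", "VA"),
  contb_receipt_dt = as.Date(c("2012-03-01", "2012-04-01", "2012-05-01",
                               "2012-06-01", "2010-12-15", "2012-07-01")),
  party = c("Democratic", "Republican", "Republican",
            "Democratic", "Democratic", "Green"),
  stringsAsFactors = FALSE
)

# Illustrative, incomplete list of non-voting territories (Guam, US Virgin Islands)
territories <- c("GU", "VI")

cleaned <- subset(
  raw,
  contb_receipt_amt > 0 &                        # drop returns (negative amounts)
    contb_receipt_amt <= 2500 &                  # individual contribution limit
    !(contbr_st %in% territories) &              # Electoral College states only
    contb_receipt_dt >= as.Date("2011-01-01") &  # assumed inclusive cut-off
    party != "Green"                             # too few contributions to analyze
)
```

In this toy example only the first row (a $50 contribution from NY) survives all five filters.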

Bivariate Plots Section

As can be seen in the above chart, there is not much difference in individual contribution amounts in Swing or Non-swing states. The median Democratic Swing State contribution of 45 differs slightly from the 50 we find in Non-swing States. The Republican median of 100 is unchanged.
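A comparison like this can be checked numerically with aggregate(). The sketch below uses made-up amounts chosen only so that the toy medians echo the ones quoted above; the "Swing" / "Non-swing" labels are an assumed encoding of swing_st.

```r
# Hypothetical contributions: two per party and swing-state group
contribs <- data.frame(
  party = rep(c("Democratic", "Republican"), each = 4),
  swing_st = rep(c("Swing", "Swing", "Non-swing", "Non-swing"), times = 2),
  contb_receipt_amt = c(40, 50, 25, 75, 100, 100, 50, 150),
  stringsAsFactors = FALSE
)

# Median contribution amount for each party / swing-state combination
medians <- aggregate(contb_receipt_amt ~ party + swing_st,
                     data = contribs, FUN = median)
medians
```

The median is used rather than the mean because the contribution amounts are heavily skewed toward small values.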

A SQRT scale is needed in the above chart to show the distribution. Clearly the populous states such as NY, CA, TX, FL, and IL dominate in election contributions.

A cursory review of US State populations suggests that the above chart corresponds approximately to state populations. Virginia, a swing state, may be an exception being 7th in contributions but only 12th in population.

Thus swing states don’t seem to make an obvious difference.

The above chart shows significantly higher median contribution amounts for the primary candidates. Rick Perry’s median of 1500 is similar to that of Timothy Pawlenty and much higher than the rest of the field. Pawlenty has far fewer overall contributions so this similarity here may be deceiving. The difference in distribution between Mitt Romney and Barack Obama can clearly be seen.

This result is interesting to me because I have always wondered how much money was “wasted” during primary elections. In the chart you can see that both Mitt Romney and Barack Obama received orders of magnitude more than the other, primary-only candidates.

In the above chart contributions accelerate as the election approaches, and stop immediately after. There are distinctly earlier contributions to the Republican Party between 8/2011 and 4/2012. The Democratic Party ends very strong with noticeably higher contributions.

In the months after the primaries, total contributions are similar until September, when the Democratic Party jumps ahead by $2 million. This lead is then maintained. The Democratic Party convention was held September 3-6, which could be a trigger for this additional contribution. Note that because of sampling at 5%, this gain would be about $40 million in the full data set.
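The adjustment from the sample to the full data set is simple arithmetic: divide the sample figure by the sampling fraction. A minimal sketch, assuming the 5% sample is representative (the variable names are illustrative):

```r
sample_fraction <- 0.05  # 5% sample, as stated at the top of this report
sample_gap <- 2e6        # the roughly $2 million Democratic lead seen in the sample

# Scale the sample figure up to an estimate for the full data set
estimated_full_gap <- sample_gap / sample_fraction

# Print as a plain dollar figure rather than in scientific notation
format(estimated_full_gap, big.mark = ",", scientific = FALSE)
```

The same factor of 20 applies to any other total or count reported here, but not to medians or percentages, which need no adjustment.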

The Democratic Party leads dramatically in number of contributions after the end of the Republican primaries: more than 100,000 in this 5% sample.

## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

In the above chart the Democratic acceleration is clear, with the ratio of cumulative contributions rising rapidly to more than 4:1.
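The ratio of cumulative contribution counts can be computed with a grouped cumsum(). The monthly counts below are made up for illustration, and the approach (ave() with cumsum) is one possible way to do it, not necessarily the one used for the chart.

```r
# Hypothetical monthly contribution counts per party
monthly <- data.frame(
  contb_receipt_ym = rep(c("2012-07", "2012-08", "2012-09"), times = 2),
  party = rep(c("Democratic", "Republican"), each = 3),
  n = c(300, 400, 500, 100, 100, 100),
  stringsAsFactors = FALSE
)

# Order by party and month, then take a running total within each party
monthly <- monthly[order(monthly$party, monthly$contb_receipt_ym), ]
monthly$cum_n <- ave(monthly$n, monthly$party, FUN = cumsum)

# Ratio of Democratic to Republican cumulative counts, month by month
dem <- monthly[monthly$party == "Democratic", ]
gop <- monthly[monthly$party == "Republican", ]
ratio <- dem$cum_n / gop$cum_n
ratio
```

This assumes both parties have a row for every month; with real data, months with no contributions for one party would need to be filled in with zero first.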

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

A clear relationship exists between party and both the number and the size of contributions. The Democratic Party is the clear leader by contribution counts and the Republican Party leads in contribution size. As may be expected, contributions accelerate closer to the election. An interesting note is the so-called “convention bump”. The press and attention around the national convention attract more interest and contributions. This bump in September 2012 can be clearly seen for the Democratic Party.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

There seem to be some similarities between the primary candidates who ultimately dropped out in the primaries. These can be seen in the box plot by candidate above. In particular the contribution size median is much higher suggesting that in order to progress in a primary election, the candidate must be supported by large contributions. In the next section we will explore this further and one of the final plots will show this relationship.

What was the strongest relationship you found?

The strongest relationship is between party and contribution count and size. Both parties raised similar total amounts, but the Democratic Party contribution count was about four times that of the Republican Party.

Multivariate Plots Section

Contribution by Months

The above scatter plot, drawn with a very low alpha value of 0.01, reveals significant differences between Republican and Democratic donation patterns. Timing is not one of them: although only the Republicans had a contested primary, the rate at which donations increased as the election approached is about the same for both parties. What is strikingly different is the number of donations from Democrats. The much darker plot on the Democratic side indicates many more contributions than on the Republican side. A second key difference is the size of donations by Republicans in the last months, indicated by the much darker area at the top right at the $2,500 level. The contribution categories (breaks) are also clearly visible.

Cumulative Contribution Amounts

Here we are interested in the rate of contributions: how fast they grew, when they started and when they ended. We will omit Mitt Romney and Barack Obama because their totals are much greater than those of the other candidates.

The faceted plot above is interesting because it shows very different shapes for non-starters and those that remained competitive. We will improve this for the final plots.

Contributions by State

There must be some good information in the state and geography data, but plots like this are not very meaningful, probably because the variation in state population, and hence the number of contributions per state, dominates. Perhaps it would be better to focus on means and/or medians.

This chart overwhelmingly shows the broad Democratic base. Only in Utah do Republican Party contributions outnumber those for the Democratic Party. While there is substantial variation in mean contribution size by state, the median is the same in about 80% of all states.

Occupations

We will take a look at occupations and see if there is some relationship with party. There are 31,100 different occupations listed, so let’s look at the top occupations by frequency. First a chart of counts, then a chart of amounts.

The above chart shows the differences in percentage of contributions by occupation. Positive percentages are Republican; negative percentages are Democratic. The occupations are ordered from Democratic favoring to Republican favoring.

The above chart shows the differences in $ amounts of contributions by occupation. Positive values are Republican; negative values are Democratic. The occupation order is the same as in the previous chart: Democratic favoring to Republican favoring by count.

A few observations:

  • The largest differences in contribution totals are from Professors, Attorneys, Retired, and Homemakers.
  • Fewer occupations dominate the Republican contributions, while the Democratic occupations are more varied. 18 of the 30 occupations have more dollars contributed to the Democratic Party.
  • Compared to the previous chart, the crossover from Democratic to Republican is much higher, which reflects the much larger contribution amounts prevalent with the Republican Party.
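One way to compute a signed percentage difference per occupation is sketched below. The exact definition used for the charts is not stated in this report, so the formula here (Republican count minus Democratic count, as a percentage of all contributions for that occupation) is an assumption, and the rows are made up and say nothing about the real occupations.

```r
# Hypothetical contributions: one row per contribution
occ <- data.frame(
  contbr_occupation = c(rep("PROFESSOR", 4), rep("RETIRED", 3), rep("HOMEMAKER", 4)),
  party = c("Democratic", "Democratic", "Democratic", "Republican",
            "Democratic", "Republican", "Republican",
            "Republican", "Republican", "Republican", "Democratic"),
  stringsAsFactors = FALSE
)

# Contribution counts per occupation (rows) and party (columns)
counts <- table(occ$contbr_occupation, occ$party)

# Assumed definition: positive leans Republican, negative leans Democratic
pct_diff <- 100 * (counts[, "Republican"] - counts[, "Democratic"]) / rowSums(counts)

# Order from Democratic favoring to Republican favoring, as in the charts
pct_diff <- sort(pct_diff)
pct_diff
```

Replacing the counts with summed dollar amounts (for example via xtabs) would give the corresponding ordering by amount.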

Compare Occupations by State

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The plots in this section strengthened the idea that the size and number of contributions are very different for the Republican vs. the Democratic Party.

The final presidential candidate totals tower over the other candidates. For example, Barack Obama’s total receipts are 5.6 times the combined total of all Republican candidates excluding Mitt Romney.

Were there any interesting or surprising interactions between features?

Because swing states are pivotal in the election, I expected to find evidence of this. I quickly realized, however, that this can only be analyzed in the context of the state itself. States vary greatly in population and in per capita income. Perhaps these factors could be used in a future study to normalize the comparisons between states.


Final Plots and Summary

Plot ONE - How did fundraising progress for the Primary candidates?

This chart explores the growth – cumulative contributions – by month and year for the Republican participants in the presidential primary and omits the general election contenders, Mitt Romney and Barack Obama. The line width indicates the number (rate) of contributions. The original chart was hard to follow with colors - it took quite some time to figure out how to plot the names close to the lines.

  • At the lowest level, contributions for Roemer, McCotter, and Johnson never took off despite starting early in the race.

  • Pawlenty, Bachmann and Huntsman made some headway but they were never able to gain the sudden growth of the leading candidates.

  • Rick Perry’s rapid rise in contribution amounts from fewer contributors can be seen from the relatively thin line. This suggests he may have been fueled by wealthy contributors but was unable to turn that into a sustainable contribution base, as can be seen with the other long-lived candidates.

  • Gingrich and Santorum started later but grew steadily, leveling out a little earlier than Ron Paul.

Plot TWO - Comparison of contribution size

## Using party, contb_receipt_amt_category as id variables

From the left side chart above, the difference in contribution size is very clear. The Republican Party received 52 percent of its funds from contributions in the range $1,000 to $2,500. By contrast, the Democratic Party received only 23.3 percent of its funds from that category. In addition, the Democratic Party received more than 50% of its funds from contributions of $250 and under.

From the right side chart, which shows the counts, the broad base of the Democratic Party is clear. About 60% (35% + 25%) of Democratic contributions are $50 or under.
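The two kinds of percentages compared here, share of funds and share of counts per party, can be sketched with xtabs() and prop.table(). The rows below are made up, and this base R version is a stand-in for illustration rather than the code behind the plot.

```r
# Hypothetical contributions (illustrative amounts only)
contribs <- data.frame(
  party = rep(c("Democratic", "Republican"), each = 4),
  contb_receipt_amt = c(25, 50, 125, 800, 50, 250, 1200, 2500),
  stringsAsFactors = FALSE
)
contribs$contb_receipt_amt_category <- cut(
  contribs$contb_receipt_amt,
  breaks = c(0, 25, 50, 100, 250, 1000, 2500),
  dig.lab = 4
)

# Dollars per party and category, then each party's row as percentages
funds <- xtabs(contb_receipt_amt ~ party + contb_receipt_amt_category, data = contribs)
pct_funds <- 100 * prop.table(funds, margin = 1)

# Number of contributions per party and category, as row percentages
counts <- table(contribs$party, contribs$contb_receipt_amt_category)
pct_counts <- 100 * prop.table(counts, margin = 1)

pct_funds
pct_counts
```

With margin = 1 each party's row sums to 100, so the percentages describe the make-up of that party's own funds or contribution counts rather than its share of the overall total.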

Plot THREE - Democratic vs. Republican Contribution Pattern

The total contributions for both parties were very similar, with the Republicans raising 100.4% of the Democratic total.

Earlier exploratory charts suggested that the number and size of contributions are quite different between the two parties. The chart below extends this by exploring the timing, size and count of contributions for the two parties. Each contribution is plotted as a point in the traditional colors of the parties - Blue for the Democrats and Red for the Republicans. Choosing a very low alpha allows the distribution of the contributions to show through. Setting the size to be the contribution amount ensures that the color density overall reflects the relative value of each contribution. The horizontal bands are the size categories of the contribution. Time flows left to right.

This chart is quite striking and describes the differences between the Democratic and Republican fund raising. There are a few clear patterns:

  • Republicans make more donations above $250 and clearly lead the $1000 - $2500 category.

  • Democratic contributions start earlier and seem more dominant in the earlier stages except perhaps for December 2011 and January 2012. This earlier start is curious since there was no primary competition for the Democratic nomination.

  • The number of contributions in the months before the November election is dominated by Democrats, as can be seen by the intense blue in May, June and July 2012.

  • In October, one month before the election, Republicans seem to suddenly increase the number and size of their contributions as can be seen from the intense red.


Reflection

This analysis took longer than expected because I felt compelled to look at many angles. In doing so I learned many aspects of R and ggplot that I otherwise wouldn’t have.

Sampling has precluded analysis by employer or individual. It would be interesting to see if there are individuals that give to multiple candidates, or employers with a concentration of contributions.

Further analysis for swing states may be interesting, relating contributions to population and income.

By this election year, the Citizens United decision had cleared the way for much money to be contributed by corporations to PACs and Super PACs (political action committees), so this analysis covers only a small portion of the money involved in presidential elections.

Overall this analysis has reinforced the idea that the Democratic Party is more broadly based than the Republican Party, relying on a larger number of smaller contributions.